A TRUST REGION FRAMEWORK FOR MANAGING THE USE OF APPROXIMATION
Abstract. This paper presents an analytically robust, globally convergent approach to managing the
use of approximation models of various fidelity in optimization. By robust global behavior we mean the
mathematical assurance that the iterates produced by the optimization algorithm, started at an arbitrary
initial iterate, will converge to a stationary point or local optimizer for the original problem. The approach
we present is based on the trust region idea from nonlinear programming and is shown to be provably
convergent to a solution of the original high-fidelity problem. The proposed method for managing
approximations in engineering optimization suggests ways to decide when the fidelity, and thus the cost, of the
approximations might be fruitfully increased or decreased in the course of the optimization iterations. The
approach is quite general. We make no assumptions on the structure of the original problem (in particular,
no assumptions of convexity or separability) and place only mild requirements on the approximations. The
approximations used in the framework can be of any nature appropriate to an application; for instance, they
can be represented by analyses, simulations, or simple algebraic models. This paper introduces the approach
and outlines the convergence analysis.

Key words. approximation concepts, trust region methods, surrogates

Subject  classification.  Applied  and  Numerical  Mathematics 

1.  Introduction.  In  this  paper  we  present  an  approach  to  managing  the  use  of  approximation  models 
in  optimization  that  is  based  on  the  trust  region  approach  from  nonlinear  programming  [8,  17].  The  approach 
we  present  inherits  the  mathematical  robustness  and  global  and  local  convergence  properties  of  the  classical 
trust  region  methods.  By  global  convergence  we  mean  the  assurance  that  the  iterates  produced  by  an 
optimization  algorithm  working  with  the  approximation  models,  started  at  an  arbitrary  initial  iterate,  will 
converge  to  a  stationary  point  or  local  optimizer  for  the  original  problem.  The  local  convergence  rate 
determines  the  asymptotic  efficiency  of  the  method.  The  approach  we  present  also  suggests  criteria  to  decide 
when  the  fidelity  (and  thus  the  cost)  of  the  approximations  might  be  fruitfully  increased  or  decreased. 

The  use  of  approximations  in  engineering  optimization  motivates  this  work.  A  review  of  approximation 
models  in  structural  optimization  can  be  found  in  [2].  When  many  of  these  ideas  were  first  formalized, 
for  instance,  in  [22]  and  [24],  the  idea  was  to  employ  approximation  models  in  conjunction  with  existing 
mathematical  programming  techniques  to  solve  structural  design  optimization  problems.  However,  to  the 
best of our knowledge, prior analysis in the structural optimization community has focused on the question
of  whether  or  not  the  optimization  technique  would  converge  to  a  solution  of  the  problem  defined  by  the 
approximation  concept,  rather  than  the  original  problem  (e.g.,  [3,  15],  with  an  exception  being  [1,  18]). 
Because  the  method  we  propose  inherits  the  convergence  properties  of  the  classical  trust  region  algorithms 
for  nonlinear  optimization,  we  can  give  simple  conditions  which  assure  that  the  iterates  produced  by  using 
suitable  approximation  models,  starting  from  an  arbitrary  initial  iterate,  will  converge  to  a  stationary  point 
or  a  local  optimizer  of  the  original  problem.  The  analysis  easily  accommodates  varying  the  nature  of  the 
approximation  from  iteration  to  iteration. 

The  trust  region  framework  gives  an  adaptive  method  for  managing  the  amount  of  optimization  done 
with  the  approximation  models  before  one  has  recourse  to  a  detailed  model  to  check  the  validity  of  the 
design  generated  by  the  approximation  model.  This  regulation  is  based  on  the  ability  of  the  approximation 
to  predict  improvement  in  the  system  being  optimized.  Moreover,  by  comparing  the  improvement  predicted 
by  the  approximation  model  to  the  improvement  realized  for  the  true  system  being  optimized,  we  obtain 
useful  information  on  how  well  the  model  is  predicting  the  behavior  of  the  system.  This  information  can  be 
used  to  suggest  when  a  model  of  greater  or  lesser  fidelity  may  be  more  suitable  as  well  as  when  more  or  less 
optimization  might  be  done  on  the  model  before  the  next  comparison. 

In  this  paper  we  consider  only  the  case  of  unconstrained  minimization.  We  do  this,  in  part,  for  simplicity 
in  presenting  the  trust  region  approach.  A  discussion  of  trust  region  approaches  to  constrained  optimization, 
particularly  the  convergence  theory  for  constrained  algorithms,  would  require  the  introduction  of  technical 
machinery that would obscure the points we wish to make. Moreover, many nonlinear programming
algorithms for constrained optimization (penalty methods, classical and modified barrier methods, augmented
Lagrangian methods) actually proceed by solving a sequence of unconstrained optimization problems, to
which the current discussion applies. We will treat the case of constrained optimization in detail elsewhere.

Let $x = (x_1, \ldots, x_n)$ denote the design variables, and suppose that one has a model of high physical
fidelity but high computational cost, as well as an approximate model of lower physical fidelity but lower
computational cost. Let the associated performance measures (merit/cost/objective functions) be $f(x)$ and
$a(x)$ and their sensitivities (with respect to the design variables) be $\nabla f(x)$ and $\nabla a(x)$:

$$\begin{aligned}
x \;&\longrightarrow\; \text{High-fidelity (“true”) model} \;\longrightarrow\; f(x),\ \nabla f(x)\\
x \;&\longrightarrow\; \text{Approximate model} \;\longrightarrow\; a(x),\ \nabla a(x).
\end{aligned}$$


Fig. 1 describes a conceptual scheme for using approximation models in the context of optimization. One
occasionally uses information from the high-fidelity model to check designs generated using a model of lower
fidelity but of lower computational cost. One then takes a number of optimization iterations using this
simpler, cheaper approximation model. At the end of this optimization phase, one has recourse to the
high-fidelity model to recalibrate the lower-fidelity model and then continues optimization using the simplified
model.

In  order  to  make  such  a  scheme  robust — that  is,  to  be  assured  that  we  are  converging  to  a  design  that  is 
likely  to  yield  at  least  a  local  optimum  for  the  original,  high-fidelity  problem — we  must  address  the  following 
questions. 

•  What does one do when the design derived from optimization using the approximation model $a$ fails
to produce improvement in the true objective $f$?

•  More  generally,  how  can  one  use  information  about  the  predictive  value  of  a  (or  lack  thereof)  to 
regulate  the  amount  of  optimization  done  using  a  before  recourse  to  the  high-fidelity  model? 
[Figure: “Optimization on simplified model”]

Fig. 1. Conceptual optimization algorithm using approximation models.

In addition, for reasons of efficiency, one also seeks guidance with regard to the following:

•  When  might  it  be  appropriate  to  either  change  or  refine  the  model  to  improve  the  progress  of 
the  optimization?  When  can  the  quality  (and,  presumably,  cost)  of  the  approximation  model  be 
reduced? 

The  trust  region  mechanism  gives  a  systematic  response  to  both  poor  and  incorrect  prediction  on  the  part 
of  the  approximation  model  while  not  being  so  conservative  as  to  retard  progress  when  the  approximation 
model  does  a  good  job  of  predicting  improvement  in  the  high-fidelity  model.  Furthermore,  the  trust  region 
mechanism  gives  us  a  measure  of  how  well  the  model  is  predicting  improvement  in  the  system  and  thus 
suggests  criteria  for  changing  or  updating  the  model  based  on  this  measure  of  the  predictive  abilities  of  the 
model. 

The approach we present is a confluence of a theoretically and practically attractive nonlinear
programming methodology with a widely used engineering practice. The exclusive focus in nonlinear optimization
has been on the use of local models, almost always quadratic, with various approximations to first- and
second-order sensitivities. Our contribution here is the observation that the trust region framework fits
naturally with the idea of using non-quadratic approximations found in engineering optimization. We require
the approximations only to satisfy a few mild conditions, discussed in §3. Mathematically, the trust region
framework for managing approximations is a straightforward extension of the classical trust region theory.

In  §2  we  present  the  relevant  features  of  classical  trust  region  algorithms.  In  §3  we  apply  the  classical  trust 
region  approach  to  manage  the  use  of  the  more  general  approximations  available  in  engineering  applications. 
In  §4  we  discuss  how  the  convergence  theory  for  classical  trust  region  algorithms  remains  valid  for  the  new 
approach  using  general  approximation  models.  In  §5  we  discuss  how  the  information  used  in  the  course  of 
the classical trust region approach might be used to decide when it is appropriate to change
or  refine  the  model  to  improve  the  progress  of  the  optimization.  Examples  of  approximations  that  can  be 
used  in  the  trust  region  framework  we  present  are  discussed  in  the  Appendix. 

2. The classical trust region approach in optimization. One of the goals of modern nonlinear
programming  algorithms  is  robust  global  behavior.  By  robust  global  behavior  we  mean  the  mathematical 
assurance  that  the  iterates  produced  by  an  optimization  algorithm,  started  at  an  arbitrary  initial  iterate,  will 
converge  to  a  stationary  point  or  local  optimizer  for  the  problem.  This  robustness  is  achieved  by  globalization 
strategies  such  as  trust  regions,  line  searches,  and  continuation  methods. 

The classical trust region idea is to regulate the length of the steps taken in an iterative optimization
process based on how well the current quadratic Taylor series model of $f$ is found to predict improvement
in $f$. This leads to an adaptive method for adjusting the size of the steps taken based on how well the local
quadratic models are predicting decrease in $f$.

At iteration $k$, one begins by building a quadratic model $q_k$ of the objective function:

$$f(x_k + s) \approx q_k(x_k + s) = f(x_k) + g_k^T s + \tfrac{1}{2}\, s^T B_k s. \tag{1}$$

We use here the notation $s$ to denote the prospective step $\Delta x$ in the design variables, $g_k$ to denote an
approximation of the gradient $\nabla f(x_k)$, and $B_k$ to denote a model of the second derivatives
(curvature) of $f$ at $x_k$. The convergence analysis of trust region methods in the unconstrained case places very
mild requirements on the information used to construct the approximation (1). The gradient approximation
$g_k$ can be of limited accuracy [5], while the $B_k$ need only remain uniformly bounded in norm. Typically $B_k$
is the Hessian of $f$, calculated analytically or via finite differences, or a quasi-Newton Hessian built up using
a secant update such as BFGS or DFP [8].

The trust region algorithm proceeds by building and minimizing quadratic models of the form (1).
However, in general such a quadratic model is known to be a good approximation only in a neighborhood
of $x_k$. Consequently, we restrict the step we take to a region in which we trust the quadratic model to
approximate $f$ well, whence the name “trust region.” This is done by adding a constraint on the length of
the step allowed, resulting in the trust region subproblem:


$$\begin{aligned}
\text{minimize}\quad & q_k(x_k + s)\\
\text{subject to}\quad & \|s\| \le \delta_k.
\end{aligned}\tag{2}$$


Note that the direction of the step may vary with $\delta_k$. In practice, one need not solve problem (2)
exactly for the step $s_k$; there is a relaxed condition on how much decrease $s_k$ must produce in the quadratic
model $q_k$ in order to ensure robust behavior [26, 17]. We discuss this criterion, the Fraction of Cauchy
Decrease condition, in §4.
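One standard way to produce an approximate solution of (2) is the Cauchy point: the minimizer of $q_k$ along $-g_k$ inside the $\ell_2$ trust region. The sketch below is an illustration under our own assumptions (plain Python lists stand in for vectors and matrices; the helper names are hypothetical), not the authors' code. The Cauchy point is known to satisfy the Fraction of Cauchy Decrease condition discussed in §4.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(B, v):
    return [dot(row, v) for row in B]


def norm(v):
    return math.sqrt(dot(v, v))


def quadratic_model(f_xk, g, B, s):
    """q_k(x_k + s) = f(x_k) + g^T s + 0.5 s^T B s, as in (1)."""
    return f_xk + dot(g, s) + 0.5 * dot(s, matvec(B, s))


def cauchy_step(g, B, delta):
    """Minimize q_k along -g subject to ||s|| <= delta (the Cauchy point)."""
    gnorm = norm(g)
    if gnorm == 0.0:
        return [0.0] * len(g)
    gBg = dot(g, matvec(B, g))
    if gBg <= 0.0:
        tau = 1.0  # nonpositive curvature along -g: go to the boundary
    else:
        tau = min(gnorm ** 3 / (delta * gBg), 1.0)  # tau < 1 means an interior minimizer
    return [-tau * delta / gnorm * gi for gi in g]
```

For $g_k = (2, 0)$, $B_k = 2I$, and $\delta_k = 10$, the step is $s = (-1, 0)$, the unconstrained minimizer of $q_k$; with $\delta_k = 0.5$ the same direction is truncated at the boundary, giving $s = (-0.5, 0)$.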

In this algorithm we use an $\ell_2$ trust region; that is, the bound on the step is expressed using
the Euclidean norm. This choice assumes an even scaling of all the components of $x$. In practice, the variables are
scaled to improve performance; this leads to a trust region of the form $\|As\| \le \delta_k$, where $A$ is a symmetric
positive definite matrix. Other choices of norm are possible to define the trust region as well. For instance,
one can use the $\ell_\infty$ norm, which is more appropriate when solving problems with bounds on the design
variables [11].

One  then  decides  whether  to  accept  the  prospective  step  s: 


$$x_{k+1} = \begin{cases} x_k + s & \text{if } f(x_k + s) < f(x_k),\\ x_k & \text{otherwise.} \end{cases}\tag{3}$$


The trust radius $\delta_k$ is similar in purpose to a move limit [30]. However, the two are distinguished by the
way in which they are updated. Move limits are set and updated in a manner based on the intuition of the
user. From the point of view of mathematical analysis this is ad hoc. While the use of move limits in this
way can be successful, to the authors' knowledge no proof of convergence exists for optimization algorithms that
use move limits. On the other hand, in trust region algorithms, the “move limit” (the trust radius) is
expanded and contracted in a systematic way for which one can prove global and local convergence results.

In  particular,  after  each  optimization  iteration,  the  trust  radius  is  updated  in  an  adaptive  way  based  on 
the  predictive  quality  of  the  quadratic  model  used  to  generate  steps,  according  to  the  following  principles: 

1. If the model did a very good job of predicting the actual improvement of $f$, or if there was even more
improvement in $f$ than predicted, then increase $\delta_k$ and allow a longer step at the next optimization
iteration $k + 1$, since the model has proven its utility in finding improvement in $f$ over the current
trust region.

2. However, if the model did a bad job of predicting the improvement in $f$, either because $f$ actually
increased with the step $s$, or because $f$ did decrease but not nearly as much as predicted by
the quadratic model, decrease the size of the trust region used in the next optimization iteration.
Calculus assures us that the quadratic model is good if we remain sufficiently close to $x_k$.

3. Finally, if the model did an acceptable but not especially noteworthy job of predicting the improvement
in $f$, leave the size of the trust region alone.

Numerically, one chooses constants $0 < r_1 < r_2 < 1$ and $0 < c_1 < 1 < c_2$ that regulate the expansion and
contraction of the trust region. One compares the actual and predicted decrease,

$$r = \frac{f(x_k) - f(x_k + s)}{f(x_k) - q_k(x_k + s)},$$

and updates the trust radius as follows:

$$\delta_{k+1} = \begin{cases} c_1 \|s\| & \text{if } r < r_1,\\ \min\{c_2 \|s\|,\ \Delta_*\} & \text{if } r > r_2,\\ \|s\| & \text{otherwise,} \end{cases}\tag{4}$$

where $\Delta_*$ is an upper bound on the trust radius. Typical values for $r_1$ and $r_2$ are $r_1 = 0.10$ and $r_2 = 0.75$
[8].
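The acceptance rule (3) and the radius update (4) amount to a few lines of code. In this minimal sketch the values $c_1 = 0.25$ and $c_2 = 2$ are our own assumptions (the text fixes only $r_1$ and $r_2$), and the parameter `delta_max` plays the role of $\Delta_*$; the function names are hypothetical.

```python
def reduction_ratio(f_old, f_new, model_new):
    """r = (f(x_k) - f(x_k + s)) / (f(x_k) - q_k(x_k + s))."""
    predicted = f_old - model_new
    if predicted <= 0.0:
        raise ValueError("the step must predict a decrease in the model")
    return (f_old - f_new) / predicted


def accept_step(x, s, f_x, f_trial):
    """Rule (3): move to x_k + s only if f actually decreased."""
    if f_trial < f_x:
        return [xi + si for xi, si in zip(x, s)]
    return list(x)


def update_radius(r, step_norm, delta_max, r1=0.10, r2=0.75, c1=0.25, c2=2.0):
    """Rule (4); the defaults c1 = 0.25 and c2 = 2.0 are assumed example values."""
    if r < r1:
        return c1 * step_norm  # poor prediction: contract
    if r > r2:
        return min(c2 * step_norm, delta_max)  # very good prediction: expand, capped
    return step_norm  # acceptable prediction: keep the length of the last step
```

An under-prediction ($r > 1$) falls into the expansion branch, so the trust region is never reduced merely because the model was too pessimistic.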

Note that we do not reduce the trust region if the quadratic model under-predicts improvement in $f$. In
the context of optimization, we focus on predicting descent, as opposed to the more general question of the
overall quality of the approximation. The latter issue is important, however, in constrained problems if one
wishes to ensure feasibility of the iterates produced by an optimization algorithm.

The  classical  trust  region  algorithm  is  summarized  in  Figure  2.  We  have  omitted  stopping  criteria;  for 
a  discussion  of  stopping  criteria  for  trust  region  methods,  see  [8,  12]. 

Choose $x_0 \in \mathbb{R}^n$ and $\delta_0 > 0$.

For $k = 0, 1, \ldots$ until convergence do {

    Find an approximate solution $s_k$ to the subproblem:
        minimize $q_k(x_k + s)$
        subject to $\|s\| \le \delta_k$.

    Compare the actual and predicted decrease:
        $r = \dfrac{f(x_k) - f(x_k + s_k)}{f(x_k) - q_k(x_k + s_k)}$.

    Update $x_k$ according to (3) and $\delta_k$ according to (4).

}

Fig.  2.  The  classical  trust  region  algorithm  for  unconstrained  minimization. 
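A compact sketch of Figure 2 follows. It rests on assumptions of ours: the Cauchy step serves as the approximate subproblem solver, $B_k$ is the exact Hessian, $c_1 = 0.25$ and $c_2 = 2$, and a gradient-norm stopping test is added (Figure 2 deliberately omits stopping criteria, so the tolerance is our choice). The convex quadratic test objective with minimizer $(1, -2)$ is purely illustrative.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(B, v):
    return [dot(row, v) for row in B]


def norm(v):
    return math.sqrt(dot(v, v))


def cauchy_step(g, B, delta):
    """Approximate solution of (2) along -g; assumes g != 0 (the caller stops first)."""
    gnorm = norm(g)
    gBg = dot(g, matvec(B, g))
    tau = 1.0 if gBg <= 0.0 else min(gnorm ** 3 / (delta * gBg), 1.0)
    return [-tau * delta / gnorm * gi for gi in g]


def classical_trust_region(f, grad, hess, x0, delta0=1.0, delta_max=10.0,
                           r1=0.10, r2=0.75, c1=0.25, c2=2.0,
                           tol=1e-6, max_iter=1000):
    x, delta = list(x0), delta0
    for _ in range(max_iter):
        g = grad(x)
        if norm(g) <= tol:  # assumed stopping test (not part of Figure 2)
            break
        B = hess(x)
        s = cauchy_step(g, B, delta)
        x_trial = [xi + si for xi, si in zip(x, s)]
        fx, f_trial = f(x), f(x_trial)
        predicted = -(dot(g, s) + 0.5 * dot(s, matvec(B, s)))  # f(x_k) - q_k(x_k + s)
        r = (fx - f_trial) / predicted
        if f_trial < fx:  # rule (3)
            x = x_trial
        snorm = norm(s)
        if r < r1:  # rule (4)
            delta = c1 * snorm
        elif r > r2:
            delta = min(c2 * snorm, delta_max)
        else:
            delta = snorm
    return x


# Illustrative objective: f(x) = (x1 - 1)^2 + 10 (x2 + 2)^2.
def f(x):
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2


def grad(x):
    return [2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)]


def hess(x):
    return [[2.0, 0.0], [0.0, 20.0]]


x_star = classical_trust_region(f, grad, hess, [-1.2, 1.0])
```

Because this objective is quadratic and $B_k$ is its exact Hessian, $r = 1$ at every iteration (up to rounding), so the radius is never contracted; on a nonquadratic objective the contraction branch of (4) comes into play.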


3. A trust region approach with generalized approximation models. In the engineering optimization
literature, the quadratic model (1) based on the Taylor series for $f$ is sometimes called a formal
approximation. Alternative models used in engineering practice can produce approximations that are better
than the quadratic model over a larger neighborhood. These approximations are usually based on some
knowledge of the problem and thus are specific to the application, whereas quadratic models are always
locally applicable. Many such problem-specific approximations can be found in the structural optimization
literature; we discuss some examples that satisfy the conditions of the trust region framework in the
Appendix.

We  place  the  following  two  requirements  on  the  approximation  model  used  at  each  optimization  iteration: 

$$a_k(x_k) = f(x_k) \tag{5}$$

$$\nabla a_k(x_k) = \nabla f(x_k). \tag{6}$$

If the model $a_k$ and its first derivatives at $x_k$ agree with those of the actual objective $f$, we call the
approximation a first-order model. If, in addition,

$$\nabla^2 a_k(x_k) = \nabla^2 f(x_k), \tag{7}$$

we call the approximation a second-order model, though these are not the primary focus of this paper since
second-order derivatives of $f$ are typically not available.

In  the  case  of  unconstrained  minimization  we  actually  can  weaken  the  condition  (6)  and  develop  an 
approach  based  on  inexact  gradients  along  the  lines  of  [5],  but  for  simplicity  we  will  not  pursue  that  approach 
here.  One  can  also  develop  an  approach  using  zero-order  models  that  satisfy  only  the  condition  (5)  [7]. 

The conditions (5)-(6) are not especially restrictive if exact or approximate sensitivity information for
$f$ is available. In the Appendix we discuss examples of approximations that satisfy (5)-(6) by construction.
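One common way to obtain a model that satisfies (5) and (6) by construction, offered here as an illustration rather than as one of the Appendix examples, is a first-order additive correction of a cheaper model $c$:
$a_k(x) = c(x) + [f(x_k) - c(x_k)] + [\nabla f(x_k) - \nabla c(x_k)]^T (x - x_k)$.
The helper `additive_correction` and the stand-in functions `f_hi` and `c_lo` below are hypothetical.

```python
import math


def additive_correction(c, grad_c, xk, f_xk, grad_f_xk):
    """Build a_k from a cheap model c so that a_k(x_k) = f(x_k) and grad a_k(x_k) = grad f(x_k)."""
    value_shift = f_xk - c(xk)
    grad_shift = [gf - gc for gf, gc in zip(grad_f_xk, grad_c(xk))]

    def a(x):
        linear = sum(d * (xi - xki) for d, xi, xki in zip(grad_shift, x, xk))
        return c(x) + value_shift + linear

    def grad_a(x):
        return [gc + d for gc, d in zip(grad_c(x), grad_shift)]

    return a, grad_a


# Hypothetical high-fidelity objective and a cheaper, cruder model of it.
def f_hi(x):
    return math.exp(0.5 * x[0]) + x[0] * x[1] + 2.0 * x[1] ** 2


def grad_f_hi(x):
    return [0.5 * math.exp(0.5 * x[0]) + x[1], x[0] + 4.0 * x[1]]


def c_lo(x):
    # Quadratic stand-in: exp(0.5 x1) replaced by its second-order Taylor expansion at 0.
    return 1.0 + 0.5 * x[0] + 0.125 * x[0] ** 2 + x[0] * x[1] + 2.0 * x[1] ** 2


def grad_c_lo(x):
    return [0.5 + 0.25 * x[0] + x[1], x[0] + 4.0 * x[1]]


xk = [1.0, -0.5]
a_k, grad_a_k = additive_correction(c_lo, grad_c_lo, xk, f_hi(xk), grad_f_hi(xk))
```

The correction must be rebuilt at every new iterate $x_k$, so the resulting model changes from iteration to iteration even when the underlying cheap model does not.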

The conditions (5) and (6) guarantee that, sufficiently close to $x_k$, the approximation $a_k$ is a good model
of $f$. It is then clear how the trust region approach provides a mechanism to regulate the use of $a$ in an
optimization iteration: if the approximation model $a$ is not a good predictor of the improvement of $f$ for a
long step, we decrease $\Delta_k$ and fall back on a region in which $a$ is an increasingly good model of $f$. On the
other hand, if $a$ is doing a good job of approximating the behavior of $f$, then we do not decrease the length
of the steps we take, and thereby avoid unnecessarily restricting the progress of the optimization.

The trust region algorithm for unconstrained minimization using general first-order approximation models
is then given in Figure 3, again omitting the stopping criteria. In §4 we explain precisely what we mean
by finding an approximate solution $s_k$ to the subproblem.

Choose $x_0 \in \mathbb{R}^n$, $\Delta_0 > 0$.

For $k = 0, 1, \ldots$ until convergence do {

    Choose $a_k$ that satisfies $a_k(x_k) = f(x_k)$ and $\nabla a_k(x_k) = \nabla f(x_k)$.

    Find an approximate solution $s_k$ (for example, as in Figure 4)
    to the subproblem:
        minimize $a_k(x_k + s)$
        subject to $\|s\| \le \Delta_k$.

    Compare the actual and predicted decrease in $f$:
        $r = \dfrac{f(x_k) - f(x_k + s_k)}{f(x_k) - a_k(x_k + s_k)}$.

    Update $x_k$ according to (3) and $\Delta_k$ according to (4).

}

Fig. 3. A trust region algorithm using general approximation models.
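A minimal sketch of Figure 3 follows, under several assumptions of ours. The model $a_k$ is built by a first-order additive correction of a cheaper model (one possible construction satisfying (5) and (6)). The subproblem is solved approximately by projected-gradient steps with Armijo backtracking on $a_k$ inside the trust region; this is a simple stand-in for the procedure of Figure 4, not the authors' method. Its first step from $x_k$ is a backtracking step along $-\nabla a_k(x_k) = -\nabla f(x_k)$, which for models with bounded Hessians is expected to give a decrease of the kind the FCD condition of §4 requires. The objective, the cheap model, and all parameter values are illustrative.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def norm(v):
    return math.sqrt(dot(v, v))

def project(center, z, radius):
    """Project z onto the ball ||z - center|| <= radius."""
    d = [zi - ci for zi, ci in zip(z, center)]
    dn = norm(d)
    if dn <= radius:
        return list(z)
    return [ci + radius * di / dn for ci, di in zip(center, d)]

def additive_correction(c, grad_c, xk, f_xk, grad_f_xk):
    """First-order model a_k = c + constant + linear term, so (5) and (6) hold at x_k."""
    dv = f_xk - c(xk)
    dg = [gf - gc for gf, gc in zip(grad_f_xk, grad_c(xk))]
    def a(x):
        return c(x) + dv + dot(dg, [xi - xki for xi, xki in zip(x, xk)])
    def grad_a(x):
        return [gc + d for gc, d in zip(grad_c(x), dg)]
    return a, grad_a

def solve_subproblem(a, grad_a, xk, delta, iters=20, sigma=1e-4):
    """Approximately minimize a over ||x - xk|| <= delta: projected gradient + Armijo."""
    y, a_y = list(xk), a(xk)
    for _ in range(iters):
        g = grad_a(y)
        gn = norm(g)
        if gn == 0.0:
            break
        t = delta / gn  # first trial step reaches the trust region boundary
        while t > 1e-16:
            z = project(xk, [yi - t * gi for yi, gi in zip(y, g)], delta)
            a_z = a(z)
            if a_z <= a_y + sigma * dot(g, [zi - yi for zi, yi in zip(z, y)]):
                break  # sufficient decrease in a_k
            t *= 0.5
        else:
            break  # backtracking failed; keep the current point
        y, a_y = z, a_z
    return [yi - xi for yi, xi in zip(y, xk)]

def trust_region_with_models(f, grad_f, c, grad_c, x0, delta0=0.5, delta_max=2.0,
                             r1=0.10, r2=0.75, c1=0.25, c2=2.0, tol=1e-6, max_iter=200):
    x, delta = list(x0), delta0
    for _ in range(max_iter):
        fx, gx = f(x), grad_f(x)
        if norm(gx) <= tol:  # assumed stopping test
            break
        a, grad_a = additive_correction(c, grad_c, x, fx, gx)
        s = solve_subproblem(a, grad_a, x, delta)
        snorm = norm(s)
        if snorm == 0.0:
            break  # no progress possible on the model
        x_trial = [xi + si for xi, si in zip(x, s)]
        predicted = fx - a(x_trial)  # f(x_k) - a_k(x_k + s_k)
        if predicted <= 0.0:
            delta = c1 * snorm
            continue
        f_trial = f(x_trial)
        r = (fx - f_trial) / predicted
        if f_trial < fx:  # rule (3)
            x = x_trial
        if r < r1:  # rule (4)
            delta = c1 * snorm
        elif r > r2:
            delta = min(c2 * snorm, delta_max)
        else:
            delta = snorm
    return x

# Hypothetical "high-fidelity" objective and a cruder model of it.
def f(x):
    return (1 - x[0]) ** 2 + 10 * (x[1] - x[0] ** 2) ** 2

def grad_f(x):
    return [-2 * (1 - x[0]) - 40 * x[0] * (x[1] - x[0] ** 2), 20 * (x[1] - x[0] ** 2)]

def c(x):
    return (1 - x[0]) ** 2 + 8 * (x[1] - x[0] ** 2) ** 2

def grad_c(x):
    return [-2 * (1 - x[0]) - 32 * x[0] * (x[1] - x[0] ** 2), 16 * (x[1] - x[0] ** 2)]

x_end = trust_region_with_models(f, grad_f, c, grad_c, [-1.2, 1.0])
```

The budget of 20 inner steps and the Armijo parameter $\sigma = 10^{-4}$ are arbitrary choices; any solver whose steps meet the FCD condition could take their place.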

Note  that  the  approximation  need  not  be  fixed  across  all  iterations;  it  can  vary  with  the  iteration,  so 
we  denote  the  approximation  by  ak  rather  than  a.  One  can  change  the  approximation  model  if  necessary 
to reflect the current region of the design space (for the case of quadratic models, see [4]). Furthermore,
though  only  a  single  level  of  approximation  is  described  in  Figure  3,  nothing  precludes  additional  levels 
of  approximation  in  solving  the  subproblem.  The  framework  we  propose  can  accommodate  many  levels  of 
approximation  which  vary  in  fidelity  but  also,  presumably,  in  computational  cost. 

4. Robustness and convergence theory. Obviously, the practical performance of any algorithm
along the lines of Figure 1 depends on the quality of the approximation models and their ability to predict
the behavior of $f$. However, we can make some general mathematical statements about the analytical
robustness of the trust region framework for using approximation models in optimization.

In the conceptual algorithm given in Figure 1, if an optimization iteration is unsuccessful, the user would
respond by improving the fidelity of the model and/or “doing less optimization.” The trust region approach
focuses on the latter option, which is accomplished by reducing the trust radius. This incidentally has the
effect of “improving the model” insofar as attention is restricted to smaller regions in which the approximation
is increasingly accurate. This strategy enables us to establish the robust behavior of the approach under the
reasonably mild matching conditions (5) and (6), as we now discuss.

The algorithm in Figure 3 is the classical trust region procedure for unconstrained minimization with
the distinction that the trust region subproblem

$$\begin{aligned}
\text{minimize}\quad & a_k(x_k + s)\\
\text{subject to}\quad & \|s\| \le \Delta_k
\end{aligned}\tag{8}$$

involves a general first-order approximation model $a_k$ instead of the conventional quadratic model $q_k$ given
by (1).

The trust region algorithm for general approximation models in Figure 3 requires only that this subproblem
be solved approximately. Here “approximately” means that the solution of each iteration, $s_k$, can be
obtained in any manner suitable to the application, as long as it satisfies a condition, known as the Fraction
of Cauchy Decrease (FCD) condition, concerning the change in the model from the point $x_k$ to the
point $x_k + s_k$. Let $g(x) = \nabla f(x)$. We state the FCD condition in the following form: there exist $\beta > 0$
and $C > 0$, independent of $k$, for which the step $s_k$ satisfies

$$f(x_k) - a_k(x_k + s_k) \ge \beta\, \|g(x_k)\| \min\left\{\Delta_k,\ \frac{\|g(x_k)\|}{C}\right\}. \tag{9}$$

Roughly speaking, (9) says that we require the approximation to predict some fraction of the improvement
in $f$ that is predicted by the minimum of the linear model of $f$ restricted to the trust region.

Moreover, a consequence of (9) is that we need not strictly enforce the trust region constraint. The
length of the step $s_k$ is acceptable if $\|s_k\| \le \alpha \Delta_k$ for some $\alpha \ge 1$ independent of $k$. See [17] for a more general
discussion of the FCD condition.
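Condition (9) is straightforward to check numerically. The sketch below tests it for the Cauchy step of a quadratic model, for which the decrease is known to satisfy (9) with $\beta = 1/2$ and $C = \|B_k\|$. As an assumption of ours, $C$ is taken to be the Frobenius norm, which bounds the spectral norm from above and therefore only weakens the requirement. The random 2-D test instances, the rounding tolerance, and the helper names are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(B, v):
    return [dot(row, v) for row in B]


def norm(v):
    return math.sqrt(dot(v, v))


def cauchy_step(g, B, delta):
    """Cauchy point of the quadratic model; assumes g != 0."""
    gnorm = norm(g)
    gBg = dot(g, matvec(B, g))
    tau = 1.0 if gBg <= 0.0 else min(gnorm ** 3 / (delta * gBg), 1.0)
    return [-tau * delta / gnorm * gi for gi in g]


def satisfies_fcd(decrease, grad_norm, delta, beta, C, rtol=1e-12):
    """Condition (9): decrease >= beta * ||g|| * min(delta, ||g|| / C), up to rounding."""
    bound = beta * grad_norm * min(delta, grad_norm / C)
    return decrease >= bound - rtol * max(1.0, bound)


def cauchy_steps_satisfy_fcd(trials=1000, seed=0):
    """Random 2-D quadratic models: does the Cauchy step meet (9) with beta = 1/2?"""
    rng = random.Random(seed)
    for _ in range(trials):
        g = [rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0)]
        b11, b12, b22 = (rng.uniform(-5.0, 5.0) for _ in range(3))
        B = [[b11, b12], [b12, b22]]
        delta = rng.uniform(0.01, 5.0)
        s = cauchy_step(g, B, delta)
        decrease = -(dot(g, s) + 0.5 * dot(s, matvec(B, s)))  # q_k(x_k) - q_k(x_k + s)
        C = math.sqrt(b11 ** 2 + 2.0 * b12 ** 2 + b22 ** 2)  # Frobenius norm of B
        if not satisfies_fcd(decrease, norm(g), delta, 0.5, C):
            return False
    return True
```

Note that the random Hessians include indefinite ones: the Cauchy step still meets (9) there, because nonpositive curvature along $-g$ only increases the decrease obtained at the boundary.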

The  FCD  condition  (9)  is  very  mild  and,  typically,  trust  region  algorithms  automatically  satisfy  this 
condition  by  design.  In  Figure  4  we  give  an  algorithm  for  solving  the  subproblem  (8)  that  satisfies  the  FCD 
condition.  This  algorithm  for  solving  the  subproblem  is  itself  based  on  the  classical  trust  region  approach 
using  a  local  quadratic  approximation. 

The algorithm in Figure 4 for computing the step $s_k$ is a sequence of classical trust region iterations
applied to the approximation model $a_k$ as the unconstrained objective. Algorithms for the exact and
approximate solution of the subproblem

$$\begin{aligned}
\text{minimize}\quad & q_j(y_j + p)\\
\text{subject to}\quad & \|p\| \le \delta_j,\\
& \|v_j + p\| \le \Delta_k
\end{aligned}$$

are discussed in [16].

Given $x_k \in \mathbb{R}^n$, $\Delta_k > 0$, $\tau \in (0, 1)$, and $\alpha \ge 1$, set $y_0 = x_k$, $\delta_0 = \tau \Delta_k$, $v_0 = 0$.

For $j = 0, 1, \ldots$, while $\|v_j\| \le \alpha \Delta_k$ and at least until $v_j \ne 0$ do {

    Construct a quadratic model $q_j(y_j + p) = a_k(y_j) + \nabla a_k(y_j)^T p + \tfrac{1}{2} p^T B_j p$,
    where $B_j$ approximates the second-order information for $a_k$ at $y_j$.

    Find an approximate solution $p_j$ to
        minimize $q_j(y_j + p)$
        subject to $\|p\| \le \delta_j$,
                   $\|v_j + p\| \le \Delta_k$
    that satisfies FCD for $q_j$ from $y_j$.

    Compare the actual and predicted decrease in $a_k$:
        $r = \dfrac{a_k(y_j) - a_k(y_j + p_j)}{a_k(y_j) - q_j(y_j + p_j)}$.

    Update $y_j$ according to (10) and $\delta_j$ according to (4).

    Set $v_{j+1} = v_j + (y_{j+1} - y_j)$.

}

Set $s_k = v_j$.

Fig. 4. A trust region algorithm for computing $s_k$ approximately.

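We use a slightly more stringent rule for updating $y_j$ than we used for updating $x_k$ in (3). We choose
$\mu > 0$, independent of $j$, and update as follows:

$$\begin{aligned}
&\text{If } y_j = x_k, \text{ then } && y_{j+1} = \begin{cases} y_j + p_j & \text{if } r \ge \mu,\\ y_j & \text{otherwise.} \end{cases}\\[4pt]
&\text{If } y_j \ne x_k, \text{ then } && y_{j+1} = \begin{cases} y_j + p_j & \text{if } r > 0,\\ y_j & \text{otherwise.} \end{cases}
\end{aligned}\tag{10}$$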
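A sketch of Figure 4 with the acceptance rule (10) and the radius update (4) follows, under simplifying assumptions of ours. Each inner subproblem is solved by a Cauchy step, and the two constraints $\|p\| \le \delta_j$ and $\|v_j + p\| \le \Delta_k$ are enforced conservatively through the single radius $\min\{\delta_j, \Delta_k - \|v_j\|\}$. The inner radius is capped at $\Delta_k$, a fixed budget of inner iterations replaces the loop test of Figure 4, and $B_j$ is the exact Hessian of an illustrative $a_k$. The values of $\tau$, $\mu$, $c_1$, and $c_2$ are also assumptions.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(B, v):
    return [dot(row, v) for row in B]


def norm(v):
    return math.sqrt(dot(v, v))


def cauchy_step(g, B, radius):
    """Approximate inner subproblem solution; satisfies FCD for q_j."""
    gnorm = norm(g)
    if gnorm == 0.0 or radius <= 0.0:
        return [0.0] * len(g)
    gBg = dot(g, matvec(B, g))
    tau = 1.0 if gBg <= 0.0 else min(gnorm ** 3 / (radius * gBg), 1.0)
    return [-tau * radius / gnorm * gi for gi in g]


def inner_trust_region(a, grad_a, hess_a, xk, Delta, tau=0.5, mu=0.1,
                       r1=0.10, r2=0.75, c1=0.25, c2=2.0, max_inner=30):
    """Sketch of Figure 4: classical TR iterations on a_k from x_k; returns s_k = v_j."""
    y, v, delta = list(xk), [0.0] * len(xk), tau * Delta
    for _ in range(max_inner):
        g, B = grad_a(y), hess_a(y)
        # Conservative: ||p|| <= Delta - ||v_j|| implies ||v_j + p|| <= Delta.
        p = cauchy_step(g, B, min(delta, Delta - norm(v)))
        predicted = -(dot(g, p) + 0.5 * dot(p, matvec(B, p)))  # a_k(y_j) - q_j(y_j + p_j)
        if predicted <= 0.0:
            break  # stationary point of a_k, or no room left in the outer region
        y_trial = [yi + pi for yi, pi in zip(y, p)]
        r = (a(y) - a(y_trial)) / predicted
        if y == list(xk):
            accepted = r >= mu  # rule (10), case y_j = x_k
        else:
            accepted = r > 0.0  # rule (10), case y_j != x_k
        if accepted:
            v = [vi + pi for vi, pi in zip(v, p)]  # v_{j+1} = v_j + (y_{j+1} - y_j)
            y = y_trial
        pn = norm(p)
        if r < r1:  # rule (4), with Delta as the cap on delta_j
            delta = c1 * pn
        elif r > r2:
            delta = min(c2 * pn, Delta)
        else:
            delta = pn
    return v


# Illustrative model a_k (hypothetical): a Rosenbrock-type function.
def a_k(x):
    return (1 - x[0]) ** 2 + 10 * (x[1] - x[0] ** 2) ** 2


def grad_a_k(x):
    return [-2 * (1 - x[0]) - 40 * x[0] * (x[1] - x[0] ** 2), 20 * (x[1] - x[0] ** 2)]


def hess_a_k(x):
    return [[2 - 40 * x[1] + 120 * x[0] ** 2, -40 * x[0]], [-40 * x[0], 20.0]]


s_k = inner_trust_region(a_k, grad_a_k, hess_a_k, [0.0, 0.0], 1.0)
```

Starting from $x_k = (0, 0)$ with $\Delta_k = 1$, the first inner step $p_0 = (0.5, 0)$ gives $r = 1/6 \ge \mu$ and is accepted; later steps are accepted only if they decrease $a_k$.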

This rule, a slight modification of one frequently encountered in both the theoretical analysis and practical
implementation of trust region methods, ensures that the solution of the subproblem (8) satisfies the FCD
condition for $f$ at $x_k$: we do not accept a step from $x_k$ until $r \ge \mu > 0$. (We are guaranteed that eventually we
will find such a step since, if we do not, we reduce the trust region, and ultimately a successful step will be
found.) Let $p_N$ be that first acceptable step. Since the step generated by the classical trust region algorithm
applied to $a_k$ satisfies the FCD condition for $q_N$ at $x_k$, and $r \ge \mu$, we have

$$\begin{aligned}
a_k(x_k) - a_k(x_k + p_N) &\ge \mu \bigl(a_k(x_k) - q_N(x_k + p_N)\bigr)\\
&\ge \mu\beta\, \|\nabla a_k(x_k)\| \min\left\{\delta_N,\ \frac{\|\nabla a_k(x_k)\|}{C}\right\},
\end{aligned}$$

which, in light of (5) and (6), yields

$$f(x_k) - a_k(x_k + p_N) \ge \mu\beta\, \|\nabla f(x_k)\| \min\left\{\delta_N,\ \frac{\|\nabla f(x_k)\|}{C}\right\}.$$

If we place the additional requirement that the Hessians $\nabla^2 a_k(x_k + s)$ of the approximations $a_k$ are bounded
uniformly in $k$ for all $s$ such that $\|s\| \le \Delta_*$, then we can be assured that there exists $\gamma$ independent of $k$
for which $\delta_N \ge \gamma \Delta_k$, so we arrive at an FCD condition for $a_k$ as an approximation of $f$ at $x_k$:

$$f(x_k) - a_k(x_k + p_N) \ge \gamma\mu\beta\, \|\nabla f(x_k)\| \min\left\{\Delta_k,\ \frac{\|\nabla f(x_k)\|}{C}\right\}.$$

Since any steps after step $N$ only produce further decrease in $a_k$, the step generated by the algorithm in
Figure 4 produces a fraction of Cauchy decrease for $a_k$ as an approximation of $f$ at $x_k$.
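The passage from the bound involving $\delta_N$ to the final FCD bound uses one elementary inequality. Writing $G = \|\nabla f(x_k)\|/C$ and assuming, without loss of generality, that $\gamma \le 1$ (a larger $\gamma$ can always be replaced by 1):

$$\min\{\delta_N, G\} \;\ge\; \min\{\gamma\Delta_k, G\} \;\ge\; \min\{\gamma\Delta_k, \gamma G\} \;=\; \gamma \min\{\Delta_k, G\},$$

where the first inequality uses $\delta_N \ge \gamma\Delta_k$ and the second uses $G \ge \gamma G$. Multiplying through by $\mu\beta\,\|\nabla f(x_k)\|$ produces the constant $\gamma\mu\beta$ in the final bound.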

Because the convergence analysis of our algorithm is virtually identical to the analysis of classical trust
region methods, we give only an outline here. Powell's global convergence theorem [21] is a powerful result
that provides simple conditions for analyzing all trust region algorithms. The theorem states that if $f$
is bounded below and uniformly continuously differentiable, and the Hessian approximations $B_k$ in (1) are
uniformly bounded, then the sequence of iterates generated by a classical trust region algorithm whose steps
satisfy an FCD condition satisfies

$$\liminf_{k \to \infty} \|\nabla f(x_k)\| = 0.$$

If one uses a step acceptance criterion of the form (10) instead of (3) in the classical algorithm, one has

$$\lim_{k \to \infty} \|\nabla f(x_k)\| = 0.$$

Since the algorithm that we propose to find steps $s_k$ satisfies an FCD condition for the models (Figure 4),
similar results hold for our algorithm under the hypothesis that the Hessians $\nabla^2 a_k(x_k + s)$ are bounded
uniformly in $k$ for all $s$ such that $\|s\| \le \Delta_*$.

With second-order models one can devise algorithms with convergence properties like those found in
[25, 26]. These classes of algorithms ensure convergence to points at which the Hessian of $f$ is positive
semidefinite (the second-order necessary condition for minimality), and convergence to a local minimizer if
$f$ is locally convex around such a point.

Unlike the trust region approach, the line-search strategy [8, 13] does not generalize in a straightforward way to non-quadratic approximation models. If the current iterate is point A, and a line-search method applied to the approximation $a_k$ visits first the point B and then C, we know that the direction from A to B is a descent direction for $f$, and that the direction from B to C is a descent direction for $a_k$, but there is no guarantee that the direction from A to C is a descent direction for $f$. Thus, if the search direction is determined by minimizing an approximation model, any backtracking mechanism associated with a line-search method will not necessarily produce improvement as one takes shorter steps, since there is no guarantee that such a search direction is a descent direction in a neighborhood of $x_k$. One would need to store the intermediate iterates produced while minimizing $a_k$ and perform a search along the piecewise linear path defined by those points.
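A minimal sketch of such a piecewise linear search, assuming the iterates visited while minimizing $a_k$ have been stored in a list `path` that starts at $x_k$. The function names are hypothetical, and simple decrease stands in for the sufficient-decrease test a practical method would impose. Halving the arc-length parameter eventually places the trial point on the first segment from A to B, which the text identifies as a descent direction for $f$.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def point_on_path(path: Sequence[Vector], t: float) -> Vector:
    """Point at fraction t (0 <= t <= 1) of the total arc length of the
    piecewise linear path path[0] -> path[1] -> ... -> path[-1]."""
    segments = list(zip(path, path[1:]))
    lengths = [math.dist(a, b) for a, b in segments]
    remaining = t * sum(lengths)
    for (a, b), length in zip(segments, lengths):
        if length > 0.0 and remaining <= length:
            frac = remaining / length
            return [ai + frac * (bi - ai) for ai, bi in zip(a, b)]
        remaining -= length
    return list(path[-1])  # t == 1 (up to rounding) or a degenerate path


def backtrack_along_path(f: Callable[[Vector], float], path: Sequence[Vector],
                         max_halvings: int = 30) -> Vector:
    """Halve the arc-length parameter until f drops below f(path[0]).
    Simple decrease only; a practical method would require sufficient decrease."""
    f0 = f(list(path[0]))
    t = 1.0
    for _ in range(max_halvings):
        trial = point_on_path(path, t)
        if f(trial) < f0:
            return trial
        t *= 0.5
    return list(path[0])  # no decrease found: stay at the current iterate
```

For instance, with $f(x) = (x^1)^2 + (x^2)^2$, A = (1, 0), B = (0.5, 0), and C = (2, 0.5), the direction from A to C is an ascent direction for $f$, so backtracking along the straight segment from A to C never finds a decrease, while backtracking along the path A, B, C does.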

5.  Relationship  to  model  management.  The  principles  that  underlie  the  update  of  the  trust  radius 
in  the  classical  trust  region  method  were  presented  in  §2.  We  now  observe  that  these  principles  may  suggest 
ideas  for  deciding  when  a  model  of  greater  or  lesser  fidelity  may  be  more  suitable.  These  observations  are 
based  on  the  fact  that  in  the  classical  trust  region  approach,  the  approximation  model  was  always  quadratic, 
and  the  only  way  of  “improving”  the  model  was  to  reduce  the  trust  radius,  thereby  restricting  attention  to 
a  region  where  the  quadratic  model  is  a  better  approximation.  In  the  case  of  general  approximation  models, 
however,  we  may  have  the  option  of  changing  the  nature  of  the  model,  in  addition  to  changing  the  trust 
radius. 
Recall that in the classical trust region algorithm given in Figure 2, if the model did a very good job of predicting improvement in $f$, then the update rule increases $\Delta_k$, thus allowing longer steps at the next optimization iteration $k + 1$, since the model has proven itself to be good over the current trust region. In the case of general approximation models, we also have the option of changing the nature of the model. If the model is doing an exceptionally good job of producing improvement in the system, then we may be able to use a different model, perhaps one with less physical accuracy but lower computational cost.

In the classical algorithm, if the model did a bad job of predicting improvement in $f$, either because $f$ actually increased with the step $s$ or because $f$ did decrease but not nearly as much as predicted by the quadratic model, we decrease the size of the trust region used in the next optimization iteration, since this is our only mechanism for “improving” the quadratic model, and we know that this model is good if we remain sufficiently close to $x_k$. In the setting of general approximation models, the need to reduce the trust region also indicates that progress may be more successful with a different choice of models, which is again a model management decision.

Finally, in the classical algorithm we leave the size of the trust region alone if the model did an acceptable but not especially noteworthy job of predicting the improvement in $f$. For general approximation models, this suggests that the model remain the same, since its predictive abilities have just proven to be sufficient and there is no strong reason to believe that a less faithful model will be sufficient for the purposes of the optimization.
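The three cases above can be summarized in a small decision routine. The sketch below is hypothetical: the thresholds `eta_accept`, `eta_bad`, and `eta_good`, the halving and doubling factors, and the `fidelity_hint` labels are illustrative assumptions rather than the update rule of Figure 2, and the hint is only a suggestion that a model management layer might act on. It assumes the model predicted a decrease, that is, `predicted_reduction > 0`.

```python
from dataclasses import dataclass


@dataclass
class StepDecision:
    accept_step: bool
    new_radius: float
    fidelity_hint: str  # "lower", "same", or "higher" (a suggestion only)


def trust_region_decision(actual_reduction: float, predicted_reduction: float,
                          radius: float, step_norm: float,
                          eta_accept: float = 1e-4, eta_bad: float = 0.25,
                          eta_good: float = 0.75) -> StepDecision:
    """Compare the realized decrease in f with the decrease predicted by a_k."""
    if predicted_reduction <= 0.0:
        raise ValueError("the model must predict a decrease")
    rho = actual_reduction / predicted_reduction
    accept = rho > eta_accept
    if rho < eta_bad:
        # Poor prediction: shrink the region; a higher-fidelity model may help.
        return StepDecision(accept, 0.5 * step_norm, "higher")
    if rho > eta_good:
        # Very good prediction: expand the region; a cheaper model may suffice.
        return StepDecision(accept, 2.0 * radius, "lower")
    # Acceptable prediction: keep both the radius and the model.
    return StepDecision(accept, radius, "same")
```

Keeping the fidelity suggestion separate from the radius update leaves the convergence-relevant logic (step acceptance and $\Delta_k$) unchanged, whichever model the hint leads one to choose.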

6. Conclusion. We have presented a trust region approach to the use of approximation models in optimization with provable robustness properties in terms of its convergence to stationary points for the original problem. Moreover, the trust region idea of comparing the improvement predicted by the approximation model with the improvement realized for the true system being optimized may prove useful in suggesting when a model of greater or lesser accuracy may be more suitable. The practical significance of the latter point will be the subject of future investigation.
Appendix: Examples of first-order approximation models. In this section we briefly review some examples of approximation models that satisfy the first-order requirements (5) and (6).

Algebraic  approximation  models.  Early  work  in  the  development  of  approximation  concepts  for 
structural  optimization  [22]  concentrated  on  using  linear  approximations  so  that  mathematical  programming 
techniques  such  as  sequential  linear  programming  (SLP)  could  be  employed.  Linear  approximations  are  one 
instance  of  the  more  general  form 

$$a_k(x) = f(x_k) + \sum_{i=1}^{n} g_i(x_k)\,(x^i - x_k^i)\,\phi_i(x^i, x_k^i), \tag{11}$$


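where $x = (x^1, \ldots, x^n)$ and $g_i = \partial_{x^i} f$. Any approximation of the form (11) necessarily satisfies $a_k(x_k) = f(x_k)$. We also see that

$$\left. \partial_{x^i} a_k(x) \right|_{x = x_k} = g_i(x_k)\,\phi_i(x_k^i, x_k^i),$$

so $\operatorname{grad} a_k(x_k) = \operatorname{grad} f(x_k)$ if and only if $\phi_i(x_k^i, x_k^i) = 1$ for every $i$ with $g_i(x_k) \neq 0$.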
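To make the role of $\phi_i$ concrete, the sketch below builds an approximation of the form (11) from $f(x_k)$, $g(x_k)$, and a list of functions $\phi_i$, and checks its gradient at $x_k$ by central differences. The helper names and the sample data are hypothetical: $f(x) = x^1 x^2 + (x^2)^2$ at $x_k = (1, 2)$, so that $f(x_k) = 6$ and $g(x_k) = (2, 5)$.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Phi = Callable[[float, float], float]  # phi_i(x^i, x_k^i)


def separable_first_order(f_xk: float, g_xk: Sequence[float],
                          x_k: Sequence[float],
                          phis: Sequence[Phi]) -> Callable[[Vector], float]:
    """Build a_k(x) = f(x_k) + sum_i g_i(x_k) (x^i - x_k^i) phi_i(x^i, x_k^i)."""
    def a_k(x: Vector) -> float:
        return f_xk + sum(g * (xi - xki) * phi(xi, xki)
                          for g, xi, xki, phi in zip(g_xk, x, x_k, phis))
    return a_k


def central_difference_gradient(func: Callable[[Vector], float],
                                x: Sequence[float], h: float = 1e-6) -> Vector:
    """Approximate grad func(x) componentwise by central differences."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((func(xp) - func(xm)) / (2.0 * h))
    return grad


x_k, f_xk, g_xk = [1.0, 2.0], 6.0, [2.0, 5.0]
taylor = separable_first_order(f_xk, g_xk, x_k, [lambda xi, xki: 1.0] * 2)
scaled = separable_first_order(f_xk, g_xk, x_k, [lambda xi, xki: 2.0] * 2)
```

With $\phi_i \equiv 1$ (the first-order Taylor model) the finite-difference gradient at $x_k$ reproduces $g(x_k) = (2, 5)$; with $\phi_i \equiv 2$ it is approximately $(4, 10)$, so the gradient condition fails even though $a_k(x_k) = f(x_k)$ still holds.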

The choice $\phi_i(x^i, x_k^i) = 1$ yields the first-order Taylor series approximation, the simplest form of (11). An alternative approximation seen in structural optimization comes from introducing reciprocal variables into the formulation of the problem. This transformation is based on the observation that a significant class of constraints in structural engineering can be transformed from nonlinear to linear equations by using the reciprocals of the sizing type design variables (at the expense of introducing nonlinearity into the objective function). This leads to the reciprocal approximation,

$$a_k(x) = f(x_k) + \sum_{i=1}^{n} g_i(x_k)\,(x^i - x_k^i)\,\frac{x_k^i}{x^i}, \tag{12}$$

where $\phi_i(x^i, x_k^i) = x_k^i / x^i$.

Early computational results employing these two approximations [28] suggested that the use of such linearizations, in particular, a first-order Taylor series approximation, could be computationally effective (and offer significant computational savings), but that such approximations were not always accurate. It was also observed that a significant problem with the use of reciprocal variables is that the approximation becomes unbounded if any one of the variables approaches zero.
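The unboundedness is easy to see in one variable. The sketch below evaluates the reciprocal approximation (12); the function name and the sample data are hypothetical. With $f(x_k) = 1$, $g(x_k) = 1$, and $x_k = 1$, the model reduces to $a_k(x) = 2 - 1/x$, which matches $f$ to first order at $x = 1$ but tends to $-\infty$ as $x \to 0^+$ (and raises `ZeroDivisionError` at $x = 0$).

```python
from typing import Callable, List, Sequence

Vector = List[float]


def reciprocal_approximation(f_xk: float, g_xk: Sequence[float],
                             x_k: Sequence[float]) -> Callable[[Vector], float]:
    """Equation (12): a_k(x) = f(x_k) + sum_i g_i(x_k) (x^i - x_k^i) x_k^i / x^i."""
    def a_k(x: Vector) -> float:
        return f_xk + sum(g * (xi - xki) * (xki / xi)
                          for g, xi, xki in zip(g_xk, x, x_k))
    return a_k


a_k = reciprocal_approximation(1.0, [1.0], [1.0])  # a_k(x) = 2 - 1/x
for x in (1.0, 0.5, 1e-3, 1e-6):
    print(x, a_k([x]))  # the value decreases without bound as x -> 0+
```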

Subsequent work along these lines was aimed at greater accuracy and numerical stability in the approximation model without requiring the costly calculation of higher-order derivatives. Examples of such work include the modified reciprocal approximation [14],

$$a_k(x) = f(x_k) + \sum_{i=1}^{n} g_i(x_k)\,(x^i - x_k^i)\,\frac{x_k^i + c_i}{x^i + c_i}, \tag{13}$$

with $\phi_i(x^i, x_k^i) = (x_k^i + c_i)/(x^i + c_i)$ (where the values of the $c_i$'s are typically small compared to representative values of the corresponding $x^i$'s), and the conservative approximation [27],

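$$a_k(x) = f(x_k) + \sum_{i=1}^{n} g_i(x_k)\,(x^i - x_k^i)\,\phi_i(x^i, x_k^i), \tag{14}$$

where

$$\phi_i(x^i, x_k^i) = \begin{cases} 1 & \text{if } x_k^i\, g_i(x_k) > 0, \\ x_k^i / x^i & \text{otherwise.} \end{cases}$$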


The  conservative  approximation  has  the  attractive  feature  of  leading  to  a  convex  programming  problem 
and  thus  is  amenable  to  solution  by  nonlinear  programming  techniques  that  take  advantage  of  the  dual 
problem.  This  observation  has  led  to  the  development  of  a  range  of  convex  approximation  strategies  (in 
particular,  [23,  3,  10];  see  [2]  for  further  references). 
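The sketch below evaluates the modified reciprocal approximation (13) and the conservative approximation (14); the function names and sample data are hypothetical. For positive variables ($x^i, x_k^i > 0$), the linear term minus the reciprocal term equals $g_i(x_k)(x^i - x_k^i)^2 / x^i$, so the rule in (14) picks, term by term, the larger of the two; the conservative model is therefore never smaller than either the linear or the reciprocal approximation. The shifts $c_i > 0$ keep (13) finite at $x^i = 0$, where (12) would divide by zero.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def modified_reciprocal_approximation(f_xk: float, g_xk: Sequence[float],
                                      x_k: Sequence[float],
                                      c: Sequence[float]) -> Callable[[Vector], float]:
    """Equation (13): phi_i = (x_k^i + c_i) / (x^i + c_i)."""
    def a_k(x: Vector) -> float:
        return f_xk + sum(g * (xi - xki) * (xki + ci) / (xi + ci)
                          for g, xi, xki, ci in zip(g_xk, x, x_k, c))
    return a_k


def conservative_approximation(f_xk: float, g_xk: Sequence[float],
                               x_k: Sequence[float]) -> Callable[[Vector], float]:
    """Equation (14): linear term if x_k^i g_i(x_k) > 0, reciprocal term otherwise."""
    def a_k(x: Vector) -> float:
        total = f_xk
        for g, xi, xki in zip(g_xk, x, x_k):
            phi = 1.0 if xki * g > 0 else xki / xi
            total += g * (xi - xki) * phi
        return total
    return a_k
```

For $f(x_k) = 0$, $g(x_k) = (1, -2)$, $x_k = (1, 1)$, and $x = (0.5, 2)$, the conservative model gives $-1.5$, compared with $-2.5$ for the linear model and $-2$ for the reciprocal model.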

A  slightly  different  line  of  inquiry  noted  that  the  reciprocal  and  conservative  approximations  destroy  the 
linearity  of  the  problem  and  thus  the  possibility  of  using  SLP.  However,  the  posynomial  approximation  [9], 


$$a_k(x) = f(x_k) \prod_{i=1}^{n} \left( \frac{x^i}{x_k^i} \right)^{\alpha_i},$$

with

$$\alpha_i = \frac{x_k^i}{f(x_k)}\, g_i(x_k),$$

can be treated using geometric programming techniques. This approach is studied in [15, 19, 20, 29] and is noteworthy in that it has an attendant convergence analysis; [1, 18] show that under appropriate conditions, geometric programming techniques, when applied to a posynomial approximation of the original problem, converge to a stationary point of the original problem.
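A short sketch of the posynomial approximation, assuming $f(x_k) \neq 0$ and positive design variables so that the real powers are defined; the function name and the sample function are hypothetical. Since $\partial a_k / \partial x^i$ at $x_k$ equals $f(x_k)\,\alpha_i / x_k^i = g_i(x_k)$, the model satisfies the first-order conditions, and for a monomial $f$ it is exact.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def posynomial_approximation(f_xk: float, g_xk: Sequence[float],
                             x_k: Sequence[float]
                             ) -> Tuple[Callable[[Vector], float], Vector]:
    """Return a_k(x) = f(x_k) prod_i (x^i / x_k^i)^alpha_i and the exponents,
    with alpha_i = x_k^i g_i(x_k) / f(x_k). Assumes f(x_k) != 0 and x > 0."""
    if f_xk == 0.0:
        raise ValueError("the posynomial approximation needs f(x_k) != 0")
    alphas = [xki * g / f_xk for g, xki in zip(g_xk, x_k)]

    def a_k(x: Vector) -> float:
        value = f_xk
        for xi, xki, alpha in zip(x, x_k, alphas):
            value *= (xi / xki) ** alpha
        return value

    return a_k, alphas
```

For $f(x) = (x^1)^2 / x^2$ at $x_k = (2, 4)$, one has $f(x_k) = 1$ and $g(x_k) = (1, -0.25)$, which gives exponents $\alpha = (2, -1)$ and reproduces $f$ exactly.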

Finally,  we  briefly  mention  second-order  approximation  models  of  the  form 

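$$a_k(x) = f(x_k) + \sum_{i=1}^{n} g_i(x_k)\,(x^i - x_k^i)\,\phi_i(x^i, x_k^i) + \sum_{i=1}^{n} \sum_{j=1}^{n} h_{ij}(x_k)\,(x^i - x_k^i)(x^j - x_k^j)\,\psi_{ij}(x^i, x_k^i, x^j, x_k^j), \tag{15}$$

where $\phi_i(x_k^i, x_k^i) = 1$ and $\psi_{ij}(x_k^i, x_k^i, x_k^j, x_k^j) = 1$. The reciprocal quadratic approximation is an instance of this type of model:

$$a_k(x) = f(x_k) + \sum_{i=1}^{n} g_i(x_k)\,(x^i - x_k^i)\,\frac{x_k^i}{x^i} + \sum_{i=1}^{n} \sum_{j=1}^{n} h_{ij}(x_k)\,(x^i - x_k^i)(x^j - x_k^j)\,\frac{x_k^i\, x_k^j}{x^i\, x^j}.$$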


The β-correlation approach. The β-correlation method for approximate modeling presented in [6] is a generic approach to correcting a lower-fidelity model $f_{lo}$ (say, one of lower physical fidelity) by scaling. Unlike the models of the preceding section, this approach is not based on any specific mathematical form of the response $f$.

One defines the scale factor $\beta$ to be

$$\beta(x) = \frac{f(x)}{f_{lo}(x)}.$$


Given the current design $x_k$, one builds a first-order model $\beta_k$ of $\beta$ about $x_k$:

$$\beta_k(x) = \beta(x_k) + \operatorname{grad} \beta(x_k)^T (x - x_k).$$


The local model of $\beta$ is then used to scale the lower-fidelity model in order to derive a better approximation of $f$:

$$f(x) = \beta(x)\, f_{lo}(x) \approx \beta_k(x)\, f_{lo}(x).$$


It is straightforward to verify that the approximation

$$a_k(x) = \beta_k(x)\, f_{lo}(x)$$

satisfies (5) and (6).
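The verification can also be carried out numerically. The sketch below assumes, as in the discussion of (11), that (5) and (6) mean $a_k(x_k) = f(x_k)$ and $\operatorname{grad} a_k(x_k) = \operatorname{grad} f(x_k)$. It obtains $\operatorname{grad}\beta(x_k)$ from the quotient rule; by the product rule, $\operatorname{grad} a_k(x_k) = \operatorname{grad}\beta(x_k)\, f_{lo}(x_k) + \beta(x_k) \operatorname{grad} f_{lo}(x_k) = \operatorname{grad} f(x_k)$. The functions `f`, `f_lo`, and their gradients are hypothetical stand-ins for a high- and a low-fidelity model.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def beta_correlation_first_order(f_xk: float, grad_f_xk: Sequence[float],
                                 f_lo: Callable[[Vector], float], f_lo_xk: float,
                                 grad_f_lo_xk: Sequence[float],
                                 x_k: Sequence[float]) -> Callable[[Vector], float]:
    """a_k(x) = beta_k(x) f_lo(x), with beta_k the first-order model of
    beta = f / f_lo about x_k; grad beta(x_k) comes from the quotient rule."""
    beta_xk = f_xk / f_lo_xk
    grad_beta = [(gf * f_lo_xk - f_xk * gl) / f_lo_xk ** 2
                 for gf, gl in zip(grad_f_xk, grad_f_lo_xk)]

    def a_k(x: Vector) -> float:
        beta_k = beta_xk + sum(gb * (xi - xki)
                               for gb, xi, xki in zip(grad_beta, x, x_k))
        return beta_k * f_lo(x)

    return a_k


# Hypothetical high- and low-fidelity functions with analytic gradients.
def f(x: Vector) -> float:
    return math.exp(0.5 * x[0]) + x[1] ** 2 + 2.0


def grad_f(x: Vector) -> Vector:
    return [0.5 * math.exp(0.5 * x[0]), 2.0 * x[1]]


def f_lo(x: Vector) -> float:
    return 1.0 + x[0] ** 2 + x[1] ** 2


def grad_f_lo(x: Vector) -> Vector:
    return [2.0 * x[0], 2.0 * x[1]]


x_k = [0.5, 1.0]
a_k = beta_correlation_first_order(f(x_k), grad_f(x_k), f_lo, f_lo(x_k),
                                   grad_f_lo(x_k), x_k)
```

Comparing `a_k(x_k)` with `f(x_k)`, and central differences of `a_k` at `x_k` with `grad_f(x_k)`, confirms both conditions up to rounding.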

The β-correlation method can be extended to produce second-order approximation models by using a second-order model of $\beta$,

$$\beta_k(x) = \beta(x_k) + \operatorname{grad} \beta(x_k)^T (x - x_k) + \tfrac{1}{2} (x - x_k)^T \operatorname{grad}^2 \beta(x_k)\, (x - x_k),$$

and the approximation $a_k(x) = \beta_k(x)\, f_{lo}(x)$.
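A sketch of the second-order variant, under the assumption that $\beta(x_k)$, $\operatorname{grad}\beta(x_k)$, and $\operatorname{grad}^2\beta(x_k)$ are available (in practice they would be assembled from derivatives of $f$ and $f_{lo}$). The one-dimensional example is hypothetical and chosen so that $\beta(x) = e^x$ is known exactly. Because $\beta_k$ agrees with $\beta$ to second order at $x_k$, the product $a_k = \beta_k f_{lo}$ agrees with $f = \beta f_{lo}$ in value, slope, and curvature there.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def beta_correlation_second_order(beta_xk: float, grad_beta_xk: Sequence[float],
                                  hess_beta_xk: Matrix,
                                  f_lo: Callable[[Vector], float],
                                  x_k: Sequence[float]) -> Callable[[Vector], float]:
    """a_k(x) = beta_k(x) f_lo(x), where beta_k is the second-order Taylor
    model of beta about x_k; the derivatives of beta at x_k are given."""
    n = len(x_k)

    def a_k(x: Vector) -> float:
        d = [xi - xki for xi, xki in zip(x, x_k)]
        linear = sum(g * di for g, di in zip(grad_beta_xk, d))
        quadratic = sum(d[i] * hess_beta_xk[i][j] * d[j]
                        for i in range(n) for j in range(n))
        return (beta_xk + linear + 0.5 * quadratic) * f_lo(x)

    return a_k


# Hypothetical 1-D example: f_lo(x) = 1 + x^2 and f(x) = exp(x) (1 + x^2),
# so beta(x) = exp(x) and beta, beta', beta'' all equal exp(x_k) at x_k.
def f_lo(x: Vector) -> float:
    return 1.0 + x[0] ** 2


def f(x: Vector) -> float:
    return math.exp(x[0]) * f_lo(x)


x_k = [0.3]
e = math.exp(x_k[0])
a_k = beta_correlation_second_order(e, [e], [[e]], f_lo, x_k)
```

Finite differences of `a_k` and `f` at `x_k` then agree in the first and second derivatives up to truncation and rounding error.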